Our team explores MLB batting performance indicators
using data from Baseball Savant.
Decision-maker / Context: Baseball analysts and coaches
interested in understanding player performance and team-level KPIs.
Research Question: Which batting metrics are most
predictive of overall offensive value, and how does player/team
performance over time indicate future success?
Impact: Insights could guide player evaluation, team
decisions, and strategy planning.
Primary Datasets:
MLB team stats 2021-2025.csv
MLB player stats 2021-2025.csv
Columns / Metrics:
Players: player info, plate appearances (PA), batting stats, KPIs (OBP, SLG, OPS, ISO, BABIP, RBI, xBA, xSLG, exit velocity, etc.)
Teams: Win%, SLG, OBP, RBI, ISO, OPS
Data characteristics: Mixed continuous metrics, percentages, and categorical identifiers (player, year).
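The rate KPIs listed above can be derived from raw counting stats using their standard definitions. The sketch below is illustrative only: the `BattingLine` class, its field names, and the sample numbers are assumptions for this example and are not taken from the project's CSV files, which may already provide these columns precomputed.

```python
from dataclasses import dataclass


@dataclass
class BattingLine:
    """Hypothetical container for one player-season of counting stats."""
    ab: int       # at-bats
    h: int        # hits
    doubles: int
    triples: int
    hr: int       # home runs
    bb: int       # walks
    hbp: int      # hit by pitch
    sf: int       # sacrifice flies
    so: int       # strikeouts


def batting_kpis(b: BattingLine) -> dict:
    """Compute AVG, OBP, SLG, OPS, ISO, and BABIP from counting stats."""
    singles = b.h - b.doubles - b.triples - b.hr
    total_bases = singles + 2 * b.doubles + 3 * b.triples + 4 * b.hr
    avg = b.h / b.ab
    obp = (b.h + b.bb + b.hbp) / (b.ab + b.bb + b.hbp + b.sf)
    slg = total_bases / b.ab
    babip = (b.h - b.hr) / (b.ab - b.so - b.hr + b.sf)
    return {
        "AVG": avg,
        "OBP": obp,
        "SLG": slg,
        "OPS": obp + slg,   # on-base plus slugging
        "ISO": slg - avg,   # isolated power: extra bases per at-bat
        "BABIP": babip,
    }


# Made-up season line, used only to exercise the function.
sample_line = BattingLine(ab=500, h=150, doubles=30, triples=5, hr=25,
                          bb=60, hbp=5, sf=5, so=100)
sample_kpis = batting_kpis(sample_line)
for name, value in sample_kpis.items():
    print(f"{name}: {value:.3f}")
```

Because OPS and ISO are both built from SLG, these metrics are correlated with each other by construction, which is worth keeping in mind when reading the correlation results later in this report.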
Summary statistics:
These plots show projected team-level KPIs for the top 10 teams over the next three seasons (2026–2028), based on linear regression with correlated predictors.
Top Teams Selected: Based on 2025 Win%
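As a rough illustration of the selection and projection steps, the sketch below ranks teams by 2025 Win% and extrapolates each team's metric with an ordinary least-squares line fitted on year alone. This is a simplification: the project's model also uses correlated predictors, which are not reproduced here. The team names, the values, and the helper names (`fit_line`, `project`, `top_teams`) are all hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def project(history, future_years):
    """Extrapolate a {year: value} history to the given future years."""
    years = sorted(history)
    slope, intercept = fit_line(years, [history[y] for y in years])
    return {y: slope * y + intercept for y in future_years}


def top_teams(metric_by_team, year, n):
    """Return the n team names with the highest metric value in `year`."""
    ranked = sorted(metric_by_team,
                    key=lambda team: metric_by_team[team][year],
                    reverse=True)
    return ranked[:n]


# Made-up Win% histories for three placeholder teams (not real data).
win_pct = {
    "Team A": {2021: 0.50, 2022: 0.52, 2023: 0.54, 2024: 0.56, 2025: 0.58},
    "Team B": {2021: 0.55, 2022: 0.55, 2023: 0.55, 2024: 0.55, 2025: 0.55},
    "Team C": {2021: 0.60, 2022: 0.58, 2023: 0.56, 2024: 0.54, 2025: 0.52},
}

selected = top_teams(win_pct, year=2025, n=2)
projections = {team: project(win_pct[team], [2026, 2027, 2028])
               for team in selected}
for team, proj in projections.items():
    print(team, {year: round(value, 3) for year, value in proj.items()})
```

A straight-line trend can project values outside a metric's valid range (for example, Win% above 1.0) if extrapolated far enough, so projections like these are best limited to a short horizon.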
Graphs and Captions:
Projected Win Percentage (2026–2028)
Description: Projected win percentage for top MLB teams. Shows
expected growth/decline in team performance over three years.
Projected SLG (2026–2028)
Description: Shows projected slugging performance of top teams.
Patterns indicate which teams may improve power hitting.
Projected OBP (2026–2028)
Description: Team on-base percentage projection, reflecting
plate discipline and consistency.
Projected RBI (2026–2028)
Description: Estimated run production by team, linked to
scoring potential.
Projected ISO (2026–2028)
Description: Measures team isolated power; highlights teams
likely to hit extra-base hits.
Projected OPS (2026–2028)
Description: Combined on-base plus slugging metric, giving a
broad measure of offensive efficiency.
These plots show projected player-level KPIs for the top 10 players based on prior performance and correlated metrics.
Graphs and Captions:
Projected SLG for Top Players
Description: Expected slugging trends for top players.
Highlights consistency and potential breakout performers.
Projected OPS for Top Players
Description: Combines OBP and SLG for a holistic view of player
offensive output.
Projected ISO for Top Players
Description: Player isolated power trends, indicating
extra-base hitting capability.
Projected Batting Average for Top Players
Description: Tracks expected hit rate per at-bat for top
players.
Projected OBP for Top Players
Description: Player on-base percentage projections, reflecting
consistency and plate discipline.
Projected RBI for Top Players
Description: Expected run production per player, linked to
scoring potential.
Projected Exit Velocity for Top Players
Description: Exit velocity trend predictions; higher values
often correlate with power hitting.
These plots show relationships between exit velocity and player hitting stats, and overall correlations among KPIs.
EV vs SLG:
EV vs RBI:
EV vs OPS:
EV vs OBP:
EV vs ISO:
EV Correlation Heatmap vs Hitting Stats:
Team Rankings Heatmap (2021–2025 averages):
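The values behind a correlation heatmap like the ones above are pairwise Pearson coefficients. The sketch below computes them in plain Python for a handful of made-up player rows. The column values and the helper names (`pearson`, `correlation_table`) are assumptions for illustration and do not reproduce the project's data or plotting code.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def correlation_table(columns):
    """Build a nested dict of pairwise correlations from {name: values}."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names}
            for a in names}


# Hypothetical player-level values (one entry per player), not real data.
player_columns = {
    "EV":  [86.0, 88.5, 90.0, 91.5, 93.0],       # average exit velocity, mph
    "SLG": [0.380, 0.410, 0.450, 0.470, 0.520],
    "OBP": [0.330, 0.310, 0.345, 0.320, 0.350],
}

corr = correlation_table(player_columns)
for name, row in corr.items():
    print(name, {other: round(r, 2) for other, r in row.items()})
```

The resulting table is symmetric with ones on the diagonal, so a heatmap only needs one triangle of it. Each coefficient describes linear association only and says nothing about causation.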
Our analysis combined historical MLB player- and team-level statistics with regression-based projections to estimate future offensive performance for 2026–2028. By examining KPIs such as SLG, OBP, OPS, ISO, RBI, Batting Average, and Exit Velocity, we generated insights into which metrics are most indicative of future success.
Key Findings:
OPS and ISO consistently emerge as the strongest indicators of future offensive performance at both the team and player levels. Teams and players maintaining strong OPS trends also show positive projections in Win%, RBI, and SLG.
Exit Velocity (EV) correlates moderately with power metrics (ISO ≈ 0.65, SLG ≈ 0.62) but not strongly with most other KPIs. EV is therefore useful, but not a primary predictor in our dataset.
Team Projections (2026–2028):
Player Projections (2026–2028):
Heatmaps and Scatter Plots:
Limitations:
The model only uses batting statistics; defensive and situational factors are not included.
Projections are based on linear regression, which may not capture sudden changes (injuries, role changes, coaching changes, etc.).
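One way to gauge how much the second limitation matters is a simple holdout check: fit the trend on 2021–2024, predict 2025, and compare against the observed value. The sketch below does this for two made-up OPS histories, one steady and one with a sudden drop. The helper names and all numbers are hypothetical, and the year-only trend line is a stand-in for the project's regression rather than a copy of it.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def holdout_abs_error(history, holdout_year):
    """Fit on years before `holdout_year`; return |predicted - actual|."""
    train_years = sorted(y for y in history if y < holdout_year)
    slope, intercept = fit_line(train_years,
                                [history[y] for y in train_years])
    predicted = slope * holdout_year + intercept
    return abs(predicted - history[holdout_year])


# Made-up OPS histories: a steady trend vs. a sudden drop in the last year
# (for example, after an injury). Not real data.
steady = {2021: 0.700, 2022: 0.710, 2023: 0.720, 2024: 0.730, 2025: 0.740}
shock = {2021: 0.700, 2022: 0.710, 2023: 0.720, 2024: 0.730, 2025: 0.650}

steady_error = holdout_abs_error(steady, 2025)
shock_error = holdout_abs_error(shock, 2025)
print(f"steady: {steady_error:.3f}  shock: {shock_error:.3f}")
```

Averaging this error across all teams or players would give a mean absolute error for the one-year-ahead projection, which could serve as a baseline when comparing against other models.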
Future Work:
Add park effects, defensive WAR, sprint speed, pitch-level Statcast features, injury history, and multivariate models.
Use machine learning methods for improved forecasting accuracy.
Hypothetical Decision:
Data-driven Recommendation:
Benefits:
Risks:
output/.Group_3_Checkpoint_2.0
branch regularly.
Challenges:
Victories:
| Member | Role | Contribution |
|---|---|---|
| Matthew G Gonzalez | Project Lead / Co-Head Developer | Data cleaning, writing code, visualization, findings and write-up |
| Jacob D Lamothe | Code Editor/Checker, Video Editor, Presentation/Narration Lead | Checks code for mistakes and redundancies, statistical validation, edits the video at the end of the project |
| Rodolfo Lazaro | Visualization Designer | Tableau plots, checking visualizations |
| Samir Soliman | Head Developer | Importing data, writing code, statistical validation and model evaluation, findings and write-up |
We selected Option 1 — Baseball Savant MLB performance indicators.
Repository created. The team is collaborating on GitHub (a Commit → Pull → Push workflow is currently used).
Data was downloaded from baseballsavant.mlb.com.
Python and Tableau are currently used for analysis and visualization: scatter plots for relationships between variables, line plots for performance trends over the years, boxplots for distributions of key metrics, and histograms for frequency distributions of variables.
The team is working on KPIs and forecasts for On-Base Percentage (OBP), Slugging Percentage (SLG), Isolated Power (ISO), Batting Average, and Exit Velocity.
This README.md is the Checkpoint 2 deliverable.
The recorded video will show results with narration.
The final report will be displayed in the README.md file on the repo landing page.